This article pursues a statistical study of the Hough transform, 
the celebrated computer vision algorithm used to detect the presence 
of lines in a noisy image. We first study asymptotic properties of the 
Hough transform estimator, whose objective is to find the line that 
“best” fits a set of planar points. In particular, we establish strong 
consistency and rates of convergence, and characterize the limiting 
distribution of the Hough transform estimator. While the convergence 
rates are seen to be slower than those found in some standard 
regression methods, the Hough transform estimator is shown to be 
more robust as measured by its breakdown point. We next study the 
Hough transform in the context of the problem of detecting multiple 
lines. This is addressed via the framework of excess mass functionals 
and modality testing. Throughout, several numerical examples 
help illustrate various properties of the estimator. Relations between 
the Hough transform and more mainstream statistical paradigms and 
methods are discussed as well. 

1. Introduction. The Hough transform (HT), due to Hough (1959), is 
one of the most frequently used algorithms in image analysis and computer 
vision [see, e.g., Ritter and Wilson (1996) and the survey articles by Leavers 
(1993) and Stewart (1999)]. The algorithm is most often used to detect 
and estimate parameters of multiple lines that are present in a noisy image 
(typically the image is first edge-detected and the resulting data serve as 
input to the algorithm). 

In the particular case where only one line is present, the algorithm shares 
the same objective as simple linear regression, namely, estimating the slope 
and intercept of the line. While inference using regression methods is well 
understood, the statistical properties of the HT approach have not been 
studied thoroughly. Most studies have focused almost exclusively on 
algorithmic and implementation aspects [for a comprehensive survey see, e.g., 
Leavers (1993)], while few articles pursue a statistical formulation [see, e.g., 
Kiryati and Bruckstein (1992) and Princen, Illingworth and Kittler (1994)]. 

Fig. 1. An illustration of the Hough transform: (a) the original scatterplot; (b) the Hough 
domain (dual plot). 

The basic idea of the HT can be informally described as follows. Consider 
a set of planar points $\{(X_i, Y_i)\}_{i=1}^n$ depicted in Figure 1(a). The objective is 
to infer the parameters of the line that fits the data in the “best” manner. 
The key to the HT algorithm is to view each point as generating a line which 
is comprised of all pairs (slope, intercept) that are consistent with this point. 
Specifically, for the $i$th point this line is given by $L_i = \{(a, b): Y_i = aX_i + b\}$. 
The set of random lines $\{L_i\}_{i=1}^n$ is plotted in the Hough domain, depicted 
in Figure 1(b). In the statistical literature this domain is referred to as the 
dual plot. Thus, co-linearity in the original set of points will manifest itself 
in a common intersection of lines in the dual plot. 

In practice, the HT algorithm is implemented as follows. The Hough 
domain is first quantized into cells, and each such cell maintains a count of 
the number of lines that intersect it. The cell with the largest number of 
counts is the obvious estimator of the parameters of the original line. If one 
is focusing on detecting multiple lines, a threshold is specified and those cells 
with counts exceeding the threshold indicate the presence (and parametrization) 
of lines in the original image. A polar parametrization of the lines is 
also used in practical implementations, resulting in sinusoidal curves in the 
Hough domain [see, e.g., Ritter and Wilson (1996)]. 
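
The accumulation step just described can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the article: the function name `hough_accumulator`, the rectangular search window and the cell counts are hypothetical choices, and the slope-intercept parametrization is used rather than the polar one. For each data point the sketch walks over the slope columns of the grid and increments every intercept cell that the dual line $b = y - xa$ passes through.

```python
import math
from collections import Counter


def hough_accumulator(points, a_min, a_max, n_a, b_min, b_max, n_b):
    """Count, for each cell of a quantized (slope, intercept) grid,
    how many dual lines b = y - x*a pass through it."""
    da = (a_max - a_min) / n_a  # cell width along the slope axis
    db = (b_max - b_min) / n_b  # cell height along the intercept axis
    counts = Counter()
    for x, y in points:
        for i in range(n_a):
            a_lo = a_min + i * da
            a_hi = a_lo + da
            # Over the slope interval [a_lo, a_hi] the dual line covers
            # exactly the intercepts between its two end values.
            b_1 = y - x * a_lo
            b_2 = y - x * a_hi
            lo, hi = min(b_1, b_2), max(b_1, b_2)
            j_lo = max(0, math.floor((lo - b_min) / db))
            j_hi = min(n_b - 1, math.floor((hi - b_min) / db))
            for j in range(j_lo, j_hi + 1):
                counts[(i, j)] += 1
    return counts


# Three collinear points on y = x + 2; the cell containing (1, 2) is crossed
# by all three dual lines.
points = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)]
counts = hough_accumulator(points, 0.25, 1.75, 3, 0.5, 3.5, 3)
best_cell, best_count = counts.most_common(1)[0]
```

For multiple line detection, one would compare every entry of `counts` with a threshold instead of keeping only the largest one.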

The goal of this article is to provide analysis that formalizes and elucidates 
statistical properties of the HT methodology. The main contributions of this 
article are the following: 

(i) We establish almost sure consistency of the HT estimator (Theorem 1), 
determine the rate of convergence and characterize the limiting 
distribution (Theorem 2). The estimator is shown to have cube-root 
asymptotics [see, e.g., Kim and Pollard (1990)]. 

(ii) Robust properties of the HT estimator are derived. In particular, the 
breakdown point is determined (Theorem 3) and it is shown that this point 
can be made to be arbitrarily close to 50%. The theory is illustrated via a 
standard example. 

(iii) We illustrate the effects of design parameters of the HT estimator on 
its performance via a simulation study. 

(iv) We relate the multiple line detection problem to multi-modality testing 
in the Hough domain. In particular, asymptotic behavior of empirical 
excess mass functionals (Theorem 4) provides the building block by which 
one can pursue a test for the presence of multiple lines. 

While a study focusing on the statistical properties of the HT is lacking in 
the literature, several strands of statistics-related research are akin to the HT 
approach. The concept of the dual plot has appeared already in early work of 
Daniels (1954), and in more recent work of Johnstone and Velleman (1985) 
and Rousseeuw and Hubert (1999). As we shall see in what follows, the 
HT estimator is closely related to regression methods such as least median 
of squares of Rousseeuw (1984), and S-estimators studied in Rousseeuw 
and Yohai (1984) and Davies (1990). Finally, the multiple line detection 
problem is intimately related to multi-modality testing using excess mass 
[see, e.g., Hartigan (1987), Müller and Sawitzki (1991) and Polonik (1995)]. 
The basic problem of estimating the location of a single mode studied by 
Chernoff (1964) can also be viewed as a one-dimensional application of the 
HT algorithm. Further details concerning some of these relations are given 
in the sequel. 

The article has two main focal points: the first three sections, namely, 
Sections 2-4, focus on the HT estimator, while the subsequent Section 5 
discusses testing of multiple lines. Section 2 describes the precise formulation 
of the HT estimator, while Section 3 studies large sample properties of the 
HT estimator (Section 3.1) and robustness (Section 3.2). Section 4 then 
focuses on some issues concerned with the design of the estimator, effects 
of the variates and relation of the method to other statistical approaches. 
The problem of testing for multiple lines is the subject of Section 5. Finally, 
Section 6 contains several concluding remarks. Proofs are collected in two 
appendices; Appendix A gives the proofs related to the properties of the HT 
estimator, while Appendix B contains the proofs related to the multiple line 
testing problem. 

2. Definition of the HT estimator. Let data points $(X_1, Y_1), \ldots, (X_n, Y_n)$ 
be given on the plane. Each observation pair $(X_i, Y_i)$ defines a straight line 
in the Hough domain: 

$L_i$: $b = -X_i a + Y_i$, $i = 1, \ldots, n$. 

For a positive number $r$, let $B_r(\theta)$ denote the disc of radius $r$ centered at 
$\theta = (a, b)$. We are looking for a point $\theta = (a, b)$ in the Hough domain such 
that the maximal number of lines $L_i$ cross over the disc $B_r(\theta)$. More formally, 
the HT estimator $\hat\theta_{r,n}$ maximizes the objective function 

$M_{r,n}(\theta) := \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{B_r(\theta) \cap L_i \neq \emptyset\}$ 

with respect to $\theta = (a, b)$. Note that $L_i \cap B_r(\theta) \neq \emptyset$ if and only if the distance 
between the line $L_i$ and the disc center $\theta = (a, b)$, which equals 
$|X_i a + b - Y_i| / \sqrt{X_i^2 + 1}$, is less than or equal to $r$. 
Thus, $M_{r,n}(\theta)$ takes the following form: 

(1) $M_{r,n}(\theta) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{|X_i a + b - Y_i|^2 \le r^2 (X_i^2 + 1)\}$, 

and the HT estimator is defined by 

(2) $\hat\theta_{r,n} = \arg\max_{\theta \in \mathbb{R}^2} \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{|X_i a + b - Y_i|^2 \le r^2 (X_i^2 + 1)\}$. 

Hence, $\hat\theta_{r,n}$ can be regarded as an M-estimator associated with the objective 
function $M_{r,n}(\cdot)$. Note that usually the above maximum is not unique; any 
point of the solution set may be chosen as $\hat\theta_{r,n}$. Note also that the above 
definition of the HT estimator depends on the design parameter $r$. Denote 
by 

(3) $M_r(\theta) := E M_{r,n}(\theta) = P\{|Xa + b - Y|^2 \le r^2 (X^2 + 1)\}$ 

the deterministic counterpart of $M_{r,n}(\theta)$. 

Fig. 2. Template of the HT estimator. 

The HT estimator admits the following geometrical interpretation. Let 

(4) $D_\theta = \{(x, y): |xa + b - y|^2 \le r^2 (x^2 + 1)\}$, $\theta = (a, b) \in \mathbb{R}^2$. 

For given $\theta$, $D_\theta$ is the set of all points of the plane lying between two branches 
of a hyperbola that has straight lines $y = (a - r)x + b$ and $y = (a + r)x + b$ 
as its asymptotes; see Figure 2. Hence, the HT estimator given by (2) seeks 
the value $\theta$ such that the corresponding set $D_\theta$ covers the maximal number 
of data points. The set $D_\theta$ defines the so-called template of the HT in the 
observation space [e.g., Princen, Illingworth and Kittler (1992)]. We note 
that the template shape is determined by the choice of the cell shape, which 
is a disc of radius r in our case. Various estimators may be defined using other 
cell shapes; the rectangular cell is most natural. However, the difference in 
properties of these estimators is marginal. 
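
As a concrete reading of the objective function in (1), the following Python sketch evaluates the fraction of data points covered by the template for a given parameter pair. The function name `ht_objective` and the toy data are illustrative assumptions, not part of the article.

```python
def ht_objective(theta, points, r):
    """Fraction of points (x, y) with |x*a + b - y|^2 <= r^2 * (x^2 + 1),
    i.e. the share of data points covered by the template centered at theta."""
    a, b = theta
    hits = sum(
        1 for x, y in points
        if (x * a + b - y) ** 2 <= r * r * (x * x + 1.0)
    )
    return hits / len(points)


# Toy data lying exactly on y = x + 2.
toy_points = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)]
value_at_truth = ht_objective((1.0, 2.0), toy_points, r=0.1)
value_far_away = ht_objective((1.0, 5.0), toy_points, r=0.1)
```

The objective is piecewise constant in the parameter pair, which is why its maximizer is typically a set rather than a single point.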

3. Properties of the HT estimator. Asymptotic properties of the HT 
estimator are studied under the following assumptions. Suppose that 
$(X_1, Y_1), \ldots, (X_n, Y_n)$ 
are independent identically distributed random observations drawn from 
the model 

(5) $Y = a_0 X + b_0 + \varepsilon$, 

where: 

(a) $X$ is independent of $\varepsilon$, and 

(b) $\varepsilon$ is a random variable with bounded, symmetric and strictly 
unimodal density $f$, $f(x) = f(-x)$ $\forall x$. 

By strict unimodality we mean that the density $f$ has a maximum at a unique 
point, $x = 0$, and decreases in either direction as $x$ decreases or increases 
away from zero. 

Let $P_n$ denote the empirical measure of a sample of the pairs $(X_i, Y_i)$, 
$i = 1, \ldots, n$, and $P$ be the common distribution of $(X_i, Y_i)$. Then the objective 
function $M_{r,n}(\theta)$ in (1) and its deterministic counterpart, $M_r(\theta)$, can be 
written as $M_{r,n}(\theta) = P_n(D_\theta)$ and $M_r(\theta) = P(D_\theta)$, where $D_\theta$ is defined by (4). 

3.1. Asymptotics. We are interested in the asymptotic behavior of $\hat\theta_{r,n}$ 
as $n \to \infty$. The first theorem establishes consistency. 

Theorem 1. Under assumptions (a) and (b), for any fixed $r > 0$ the 
estimator $\hat\theta_{r,n}$ is strongly consistent: 

$\hat\theta_{r,n} \to \theta_0$ almost surely as $n \to \infty$, where $\theta_0 = (a_0, b_0)$. 

It is interesting to note that the consistency proof does not require 
existence of the expectation of the noise $\varepsilon$. For example, the noise may be a 
sequence of i.i.d. Cauchy random variables. The next theorem establishes 
the asymptotic distribution of the centered and scaled estimator. 

Theorem 2. Let $f$ be continuously differentiable with bounded first 
derivative, and let assumptions (a) and (b) hold. Assume that $X$ is a 
nondegenerate random variable with finite second moment. Then for every fixed $r > 0$, 
$n^{1/3}(\hat\theta_{r,n} - \theta_0) \stackrel{d}{\to} W$, where $W$ has the distribution of the (almost surely 
unique) maximizer of the process $\theta \mapsto \frac{1}{2}\theta^T V_0 \theta + G(\theta)$, 

(6) $V_0 = E\{[f'(r\|Z\|) - f'(-r\|Z\|)] Z Z^T\}$, 

$Z = (X, 1)^T$, and $G$ is a zero-mean Gaussian process with continuous sample 
paths and stationary increments such that for any $\xi, \eta \in \mathbb{R}^2$ 

(7) $E[G(\xi) - G(\eta)]^2 = 2E\{f(r\|Z\|) |Z^T(\xi - \eta)|\}$. 

The cube-root rates of convergence are due to the discontinuous nature 
of the objective function $M_{r,n}(\cdot)$. The most general results dealing with this 
type of asymptotics are given in Kim and Pollard (1990); see also van der 
Vaart and Wellner [(1996), Chapter 3]. Clearly the asymptotic distribution 
above is quite complicated. The one-dimensional instance, where $G(\cdot)$ is a 
Brownian motion, was first studied in Chernoff (1964) [see also Groeneboom 
(1989) and Groeneboom and Wellner (2001)]. 

3.2. Robustness. One way to characterize the robustness of an estimator 
is through its breakdown properties. Intuitively, the breakdown point is 
the smallest amount of “contamination” necessary to “upset” an estimator 
entirely. We use the formal definition of the finite-sample breakdown point 
given by Donoho and Huber (1983). Let $\mathcal{Y}_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ and 
$\hat\theta = \hat\theta(\mathcal{Y}_n)$ be an estimator based on $\mathcal{Y}_n$. Consider an additional data set 
$\mathcal{Y}'_k$ of size $k$. If by choice of $\mathcal{Y}'_k$ one can make $\hat\theta(\mathcal{Y}_n \cup \mathcal{Y}'_k) - \hat\theta(\mathcal{Y}_n)$ arbitrarily 
large, we say that $\hat\theta$ breaks down under contamination fraction $k/(n+k)$. 
The finite-sample addition breakdown point $\epsilon_{\mathrm{add}}(\hat\theta; \mathcal{Y}_n)$ is the minimal 
contamination fraction under which $\hat\theta$ breaks down: 

$\epsilon_{\mathrm{add}}(\hat\theta; \mathcal{Y}_n) = \min\{ \frac{k}{n+k} : \sup_{\mathcal{Y}'_k} \|\hat\theta(\mathcal{Y}_n \cup \mathcal{Y}'_k) - \hat\theta(\mathcal{Y}_n)\| = \infty \}$. 

Similarly, the finite-sample replacement breakdown point of $\hat\theta$ is defined by 

$\epsilon_{\mathrm{rep}}(\hat\theta; \mathcal{Y}_n) = \min\{ \frac{k}{n} : \sup_{\mathcal{Y}_n^k} \|\hat\theta(\mathcal{Y}_n^k) - \hat\theta(\mathcal{Y}_n)\| = \infty \}$, 

where $\mathcal{Y}_n^k$ denotes the corrupted sample obtained from $\mathcal{Y}_n$ by replacing $k$ 
data points of $\mathcal{Y}_n$ with arbitrary values. The following theorem gives the 
breakdown properties of the HT estimator $\hat\theta_{r,n}$. 


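Theorem 3. Let $\mathcal{Y}_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ be a sample with no 
repeated values of $X$. Then 

$\epsilon_{\mathrm{add}}(\hat\theta_{r,n}; \mathcal{Y}_n) = \frac{\lfloor n M_{r,n}(\hat\theta_{r,n}) \rfloor - 1}{n + \lfloor n M_{r,n}(\hat\theta_{r,n}) \rfloor - 1}$, 
$\epsilon_{\mathrm{rep}}(\hat\theta_{r,n}; \mathcal{Y}_n) = \frac{1}{n} \lfloor \frac{n M_{r,n}(\hat\theta_{r,n})}{2} \rfloor$. 

Moreover, if the conditions of Theorem 1 hold, and the distribution of $X$ is 
continuous, then, as $n \to \infty$, 

$\epsilon_{\mathrm{add}}(\hat\theta_{r,n}; \mathcal{Y}_n) \stackrel{\mathrm{a.s.}}{\to} p(1+p)^{-1}$, $\epsilon_{\mathrm{rep}}(\hat\theta_{r,n}; \mathcal{Y}_n) \stackrel{\mathrm{a.s.}}{\to} p/2$, 

where $p = P\{\varepsilon^2 \le r^2 \|Z\|^2\}$. 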


We now turn to several remarks concerning the theorem. First, the 
assumption that the sample $\mathcal{Y}_n$ does not contain repeated observations of $X$ 
rules out parallel lines in the Hough domain. This assumption is quite typical 
in the context of the regression methods utilizing the dual plot approach [see, 
e.g., Daniels (1954)]. Second, the value of $r$ controls breakdown properties 
of the HT estimator: the larger $r$, the closer the breakdown point is to 1/2. 
For example, if $r^2$ is chosen to be the $(1 - \alpha)$-quantile of the distribution 
of $\varepsilon^2 \|Z\|^{-2}$, the addition breakdown point of the corresponding estimate is 
$(1 - \alpha)/(2 - \alpha)$ and the replacement breakdown point is $(1 - \alpha)/2$. 
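
The breakdown formulas are easy to evaluate numerically. The sketch below assumes the reading of Theorem 3 in which the addition breakdown point is $(m-1)/(n+m-1)$ and the replacement breakdown point is $\lfloor m/2 \rfloor / n$, where $m = nM_{r,n}(\hat\theta_{r,n})$ is the maximal count; the function names are hypothetical and chosen only for this illustration.

```python
def finite_sample_breakdown(n, max_count):
    """Addition and replacement breakdown points for a sample of size n whose
    maximal template count is max_count (assumed reading of Theorem 3)."""
    eps_add = (max_count - 1) / (n + max_count - 1)
    eps_rep = (max_count // 2) / n
    return eps_add, eps_rep


def asymptotic_breakdown(p):
    """Limits p / (1 + p) and p / 2, where p is the probability that an
    observation falls inside the template centered at the true parameter."""
    return p / (1.0 + p), p / 2.0


eps_add, eps_rep = finite_sample_breakdown(n=50, max_count=31)
limit_add, limit_rep = asymptotic_breakdown(0.923)
```

With $p = 1 - \alpha$ the two limits reduce to $(1-\alpha)/(2-\alpha)$ and $(1-\alpha)/2$, matching the quantile remark above.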

To illustrate the breakdown properties of the HT estimator, we consider 
a numerical example given in Rousseeuw (1984). The sample containing 
30 “good” observations is generated from the model $Y_i = X_i + 2 + \varepsilon_i$, where 
$\varepsilon_i$ are Gaussian random variables with zero mean and standard deviation 0.2, 
and $X_i$ are uniformly distributed on $[1, 4]$. Then a cluster of 20 “bad” 
observations is added. These observations follow a bivariate Gaussian distribution 
with expectation $(7, 2)$ and covariance matrix $0.25I$. Figure 3 displays the 
data along with the least squares (LS), least median of squares (LMS) and 
the HT estimates. The LMS estimator is defined as the value of the 
parameter $\theta = (a, b)$ that minimizes $\mathrm{median}_{1 \le i \le n} |Y_i - aX_i - b|^2$ [see Rousseeuw 
(1984)]. The parameter $r$ of the HT estimator is set to 0.15. Under 
conditions of the experiment $P\{\varepsilon^2 (X^2 + 1)^{-1} \le 0.15^2\} \approx 0.923$, which 
approximately corresponds to a 46% replacement breakdown point. The HT 
estimator is calculated by direct maximization of (2) on the square $[-3, 3] \times [-3, 3]$ 
using a uniform rectangular grid comprised of 250,000 points. Because the 
solution is not unique, the average of the grid points where the maximum 
is achieved is taken as the estimate. Thus, the HT estimate yields $\hat a = 0.917$ 
and $\hat b = 2.173$, which is quite close to the original values $a_0 = 1$ and $b_0 = 2$. 
In general, behavior of the HT estimate in this example is very similar to 
that of the LMS. 
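
The computation described in this example (direct maximization on a grid, then averaging the maximizing grid points) can be sketched as follows. The code is an illustration and not the authors' implementation: the function names, the default grid size and the random seed are assumptions, and the default grid is much coarser than the 250,000 points used in the text so that the pure-Python loop stays fast.

```python
import random


def ht_estimate(points, r, lo=-3.0, hi=3.0, n_grid=101):
    """Grid-search sketch of (2): maximize the template count over an
    n_grid x n_grid lattice on [lo, hi]^2 and average the maximizing nodes."""
    step = (hi - lo) / (n_grid - 1)
    best_count, maximizers = -1, []
    for i in range(n_grid):
        a = lo + i * step
        for j in range(n_grid):
            b = lo + j * step
            count = sum(
                1 for x, y in points
                if (x * a + b - y) ** 2 <= r * r * (x * x + 1.0)
            )
            if count > best_count:
                best_count, maximizers = count, [(a, b)]
            elif count == best_count:
                maximizers.append((a, b))
    a_hat = sum(a for a, _ in maximizers) / len(maximizers)
    b_hat = sum(b for _, b in maximizers) / len(maximizers)
    return (a_hat, b_hat), best_count / len(points)


def contaminated_sample(seed=0):
    """30 points from Y = X + 2 + noise (sd 0.2, X uniform on [1, 4]) plus a
    cluster of 20 points around (7, 2) with covariance 0.25 * I."""
    rng = random.Random(seed)
    good = []
    for _ in range(30):
        x = rng.uniform(1.0, 4.0)
        good.append((x, x + 2.0 + rng.gauss(0.0, 0.2)))
    bad = [(rng.gauss(7.0, 0.5), rng.gauss(2.0, 0.5)) for _ in range(20)]
    return good + bad


if __name__ == "__main__":
    theta_hat, max_fraction = ht_estimate(contaminated_sample(), r=0.15)
    print(theta_hat, max_fraction)
```

The printed estimate depends on the seed and on the grid resolution, so it should not be expected to reproduce the values reported above exactly.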

4. Discussion. 

4.1. Choice of the radius r. The properties of the HT estimator depend 
on the choice of a parameter r. The results of Section 3 assert that the HT 
estimator is consistent for any choice of r, and the asymptotic distribution 
is given in Theorem 2. Thus, a reasonable choice of r would be the value 
minimizing the variance of the limiting random variable in Theorem 2. 
Unfortunately, the asymptotic distribution is not tractable, and we cannot use 
it as a basis to make a choice of $r$. Clearly, large values of $r$ lead to a large 
connected solution set, and in this case the estimation accuracy depends 
crucially on the way the estimator is chosen from the solution set. On the 
other hand, small values of $r$ lead to an “under-smoothed” dual plot, and 
the solution set is a union of many disconnected sets. In this case estimation 
accuracy of the average estimator may be very poor. 

Fig. 3. An illustration of the breakdown properties of the HT estimator. The data set 
consists of 30 observations from the underlying linear regression model and 20 “bad” data 
points (the cluster on the right). 

To study how estimation accuracy depends on $r$, we conducted the following 
simulation experiment. For sample sizes $n = 25, 50, 100$ we generate data 
sets from the model $Y_i = X_i + 2 + \varepsilon_i$, where $\varepsilon_i$ are Gaussian random variables 
with zero mean and standard deviation 0.5, and $X_i$ are uniformly distributed 
on $[-2, 2]$. The HT estimator is computed for different values of $r$. In our 
implementation we used the square $[-3, 3] \times [-3, 3]$ as the search region. The 
value of the objective function is computed at nodes of the regular grid 
comprised of 360,000 points. The resulting HT estimator is set to be the average 
of the grid nodes where the maximum of the objective function is achieved. 
Simulation results are given in Table 1. The table presents the values of the 
HT estimates of the parameters $(a_0, b_0) = (1, 2)$ averaged over 1,000 
replications, along with the square root of the resulting mean squared error. Closer 
inspection of the results shows that the mean squared error first decreases 
as $r$ grows, but when $r$ becomes large, an increase in the mean squared error 
is observed. Overall, it seems that the estimation accuracy is relatively stable 
as $r$ varies over a wide range of values. This phenomenon has been 
consistently observed for various data sets generated from different models. (The 
results described in Table 1 are one such representative example.) Finally, 
we note that, in practice, it may be advantageous to take $r$ slowly tending 
to zero as $n \to \infty$. This might be particularly important in the problem of 
multiple line testing discussed in Section 5. However, analysis of theoretical 
properties of such an estimator is beyond the scope of this article. 

4.2. Equivariance properties and the effect of design variables. We now 
briefly mention some equivariance properties of the HT estimator. In the 
context of regression estimators, different notions of equivariance are 
considered [see, e.g., Rousseeuw and Leroy (1987), page 116]. An estimator $\hat\theta$ is 
said to be regression equivariant if $\hat\theta(\{X_i, Y_i + cX_i\}_{i=1}^n) = \hat\theta(\{X_i, Y_i\}_{i=1}^n) + c$, 
where $c$ is an arbitrary constant. It is scale equivariant if $\hat\theta(\{X_i, cY_i\}_{i=1}^n) = 
c\hat\theta(\{X_i, Y_i\}_{i=1}^n)$ and affine equivariant if $\hat\theta(\{cX_i, Y_i\}_{i=1}^n) = c^{-1}\hat\theta(\{X_i, Y_i\}_{i=1}^n)$ 
for $c \neq 0$. 
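
As a quick check of these definitions against the objective in (1), the following worked computation shows what two of the transformations do to the indicator in (1). It is a sketch in the notation of Section 2, with $c$ read as a scalar shift of the slope in the first case; that reading is an assumption made here for illustration.

```latex
% Regression shift Y_i -> Y_i + c X_i: only the slope coordinate moves, by c.
|X_i a + b - (Y_i + cX_i)|^2 \le r^2 (X_i^2 + 1)
  \iff |X_i (a - c) + b - Y_i|^2 \le r^2 (X_i^2 + 1).

% Scale change Y_i -> c Y_i, c \neq 0: the radius changes from r to r/|c|.
|X_i a + b - cY_i|^2 \le r^2 (X_i^2 + 1)
  \iff |X_i (a/c) + (b/c) - Y_i|^2 \le (r/|c|)^2 (X_i^2 + 1).
```

A shift of the responses therefore translates the set of maximizers along the slope axis with $r$ unchanged, whereas rescaling the responses is equivalent to changing the radius of the cell, so for a fixed $r$ the maximizer need not scale with $c$.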

It is easily seen that the HT estimator $\hat\theta_{r,n}$ is regression equivariant, but 
not scale and affine equivariant. The equivariance properties of the HT 
estimator are clearly intimately related to the Hough template. In particular, 
the template displayed in Figure 2 implies that the estimate treats 
differently observations with small and large $X$-variate values. The straight lines 
in the Hough domain corresponding to the observations with large $X_i$ values 
are very steep. If the majority of the observations have a large $X$-coordinate 
and the standard deviation of the noise is small, then the corresponding 
straight lines are nearly parallel. In this case behavior of the HT estimator 
may be quite poor. 

Table 1 

Estimation accuracy of the HT estimator. The numbers in parentheses are the (slope, 
intercept) estimates, and the value below them is the associated root mean squared error. 
All values are obtained by averaging over 1000 replications 

                            Sample size 
  r        n = 25            n = 50            n = 100 
  0.025    (0.992, 1.981)    (0.995, 2.009)    (0.990, 2.009) 
           0.407             0.297             0.245 
  0.04     (0.999, 1.989)    (0.997, 2.013)    (0.995, 2.013) 
           0.392             0.284             0.231 
  0.05     (1.003, 2.001)    (0.995, 2.001)    (1.001, 2.009) 
           0.354             0.272             0.219 
  0.075    (1.011, 2.007)    (0.992, 2.008)    (1.000, 2.009) 
           0.322             0.264             0.213 
  0.1      (1.009, 2.008)    (0.996, 2.009)    (0.998, 2.015) 
           0.308             0.251             0.204 
  0.2      (1.000, 2.010)    (0.997, 2.012)    (1.000, 2.004) 
           0.264             0.208             0.164 
  0.4      (1.001, 2.010)    (0.999, 2.007)    (0.996, 2.003) 
           0.220             0.171             0.137 
  0.5      (0.996, 2.008)    (0.995, 2.004)    (0.994, 2.001) 
           0.211             0.174             0.135 
  0.75     (1.012, 1.999)    (1.002, 1.996)    (0.999, 2.003) 
           0.248             0.209             0.172 
  0.8      (1.015, 1.997)    (1.002, 1.997)    (0.996, 2.002) 
           0.254             0.219             0.179 

To illustrate the effect of the design distribution, we generate 100 
independent observations from the model $Y_i = X_i + 2 + \varepsilon_i$, where $\varepsilon_i$ are Gaussian 
random variables with zero mean and standard deviation 0.5. Figure 4 
displays the perspective plots of the objective function $M_{0.3,n}(\theta)$, along with 
the corresponding dual plots for two different design distributions. 
Figure 4(a) and (b) corresponds to the explanatory variables $X_i$ uniformly 
distributed on $[-2, 2]$, while Figure 4(c) and (d) shows the case of $X_i$ 
uniformly distributed on $[20, 24]$. In the second case the objective function is 
very flat. This leads to a large solution set and high variability of the HT 
estimator. Theoretically, when $X_i$ are large, the matrix $V_0$ appearing in (6) 
is nearly singular because $f'(r\|Z\|) - f'(-r\|Z\|)$ is close to zero. Therefore, 
the asymptotic distribution of $\hat\theta_{r,n}$ is close to the distribution of the point 
of maximum of a zero mean Gaussian process given in (7). To recapitulate 
this point, the influence of the design distribution on estimation accuracy 
suggests that it would be reasonable, in practice, to center the explanatory 
variables before applying the HT estimator. We note that in computer 
vision applications this does not typically pose a problem as the measurement 
units used for the $X$-coordinate are image-independent. 

Fig. 4. Perspective plots of $M_{r,n}(\theta)$ along with the corresponding dual plots: (a), (b) 
$X_i$ are uniformly distributed on $[-2, 2]$; (c), (d) $X_i$ are uniformly distributed on $[20, 24]$. 

A. GOLDENSHLUGER AND A. ZEEVI 

4.3. Related regression methods. The HT estimator may be viewed as a 
counterpart to an S-estimator [cf. Rousseeuw and Yohai (1984) and Davies 
(1990)]. Indeed, fix $\delta \in (0, 1)$ and consider the following optimization 
problem: 

(8) $\mathcal{P}(\delta)$: $\min_{\theta = (a, b) \in \mathbb{R}^2} r$ 
    s.t. $M_{r,n}(\theta) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{|Y_i - aX_i - b|^2 \le r^2 (X_i^2 + 1)\} \ge 1 - \delta$. 

Solution of (8) defines the S-estimator $\hat\theta_{S,n}$ whose replacement breakdown 
point equals $\epsilon_{\mathrm{rep}}(\hat\theta_{S,n}; \mathcal{Y}_n) = \min(\delta, 1 - \delta)$ [cf. Davies (1990)]. The LMS 
estimator, see Rousseeuw (1984), can be written in a form similar to (8). In this 
specific case $\delta = n^{-1}(\lfloor n/2 \rfloor + 1)$ and $X_i^2 + 1$ on the right-hand side should 
be replaced by 1. Recall that, by definition, the HT estimator $\hat\theta_{r,n}$ solves the 
following optimization problem: 

$\mathcal{Q}(r)$: $\max_{\theta = (a, b) \in \mathbb{R}^2} M_{r,n}(\theta) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{|Y_i - aX_i - b|^2 \le r^2 (X_i^2 + 1)\}$. 

Then the connection between the HT estimator and the S-estimator (8) is 
as follows. For a given $\delta > 0$, let $\hat r = \mathrm{val}(\mathcal{P}(\delta))$, where $\mathrm{val}(\cdot)$ is the value 
of the optimization problem, and let $\hat\theta_{S,n}$ be the solution to $\mathcal{P}(\delta)$. Then, 
clearly $\mathrm{val}(\mathcal{Q}(\hat r)) \ge 1 - \delta$, and $\hat\theta_{\hat r,n}$ belongs to the solution set of $\mathcal{P}(\delta)$. Thus, 
with this particular choice of $r$, the HT estimator and the corresponding 
S-estimator are identical; in particular, $\epsilon_{\mathrm{rep}}(\hat\theta_{\hat r,n}; \mathcal{Y}_n) = \min(\delta, 1 - \delta)$. 
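
To make the link between the two optimization problems concrete, the following sketch computes, for a fixed parameter pair, the smallest radius that satisfies the coverage constraint in (8). The function name `minimal_radius` and the toy data are assumptions made for illustration. The value of problem (8) would then be the minimum of this quantity over all parameter pairs, for instance over a grid.

```python
import math


def minimal_radius(theta, points, delta):
    """Smallest r such that at least a fraction 1 - delta of the points satisfy
    |y - a*x - b| <= r * sqrt(x^2 + 1), for the fixed theta = (a, b)."""
    a, b = theta
    scaled = sorted(
        abs(y - a * x - b) / math.sqrt(x * x + 1.0) for x, y in points
    )
    # Number of points that must be covered; note that (1 - delta) * n is
    # computed in floating point, so borderline values may round up.
    k = max(1, math.ceil((1.0 - delta) * len(points)))
    return scaled[k - 1]


toy = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0), (0.0, 10.0)]
r_half = minimal_radius((0.0, 0.0), toy, delta=0.5)
```

Here the scaled residuals are 0, 1, 3 and 10, so covering half of the four points requires the second smallest of them as the radius.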

5. Multiple line detection. In practice, the Hough domain is discretized 
into cells, and the number of lines crossing each cell is counted. Next, each 
of the cells is examined to search for “high counts.” In particular, cells with 
counts exceeding some predetermined threshold correspond to “detected” 
lines in the original space. This procedure amounts to an exhaustive search 
for local maxima (threshold crossings) in the Hough domain. Thus, in 
contrast to other line fitting procedures, the HT is used to estimate several 
lines simultaneously. It should be noted, however, that points of local 
maxima do not necessarily correspond to actual line parameters. Consequently, 
in the case of multiple lines it is more accurate to view the HT as a tool for 
testing or detecting the presence of straight lines in images. This has also 
been recognized in the computer vision literature [cf. Princen, Illingworth 
and Kittler (1994)]. 

In view of the above, one can view the multiple line detection problem 
using the HT as testing for multi-modality in the Hough domain. Testing 
multi-modality is a subject of vast literature. This problem is characterized 
by the fact that only one-sided inference is possible [see, e.g., Donoho (1988)], 
that is, the only verifiable hypotheses are of the type “there are at least three 
lines in the image.” The most appropriate approach for our purposes is based 
on the concept of excess mass [see Hartigan (1987), Müller and Sawitzki 
(1991) and Polonik (1995)], which is typically used in the “mode testing” 
problem. In the context of the HT, this excess mass corresponds to regions 
in the parameter space (Hough domain) where large counts are present. 


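5.1. Excess mass functionals. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample of 
i.i.d. random variables, and, for $r > 0$ and $\theta = (a, b) \in \mathbb{R}^2$, let $M_{r,n}(\theta)$ and 
$M_r(\theta)$ be given by (1) and (3), respectively. We stress that $(X_1, Y_1), \ldots, (X_n, Y_n)$ 
are not assumed to be drawn from the linear model (5). Throughout this 
section we suppose that parameter $\theta$ is confined to a compact set $\Theta_0 \subset \mathbb{R}^2$. 
The excess mass functional is defined by 

$E(\lambda) := \int (M_r(\theta) - \lambda)^+ \, d\theta = \int_{\Theta_\lambda} M_r(\theta) \, d\theta - \lambda \mathcal{L}\{\Theta_\lambda\}$, 

where $(x)^+ := \max(0, x)$, $\Theta_\lambda := \{\theta \in \mathbb{R}^2 : M_r(\theta) \ge \lambda\}$, and $\mathcal{L}\{\cdot\}$ stands for 
Lebesgue measure in $\mathbb{R}^2$. We call $\Theta_\lambda$ the $\lambda$-level set; note that $\Theta_\lambda$ is closed 
and bounded because $M_r(\cdot)$ is continuous. For a compact set $\Theta \subset \mathbb{R}^2$ and 
$\lambda \in (0, 1)$, let us define 

$H_\lambda\{\Theta\} := \int_\Theta M_r(\theta) \, d\theta - \lambda \mathcal{L}\{\Theta\}$. 

Then $E(\lambda) = \sup\{H_\lambda\{\Theta\} : \Theta \subset \mathbb{R}^2 \text{ compact}\}$. The empirical version of the 
excess mass functional is obtained by substituting $M_{r,n}(\cdot)$ for $M_r(\cdot)$ in the 
definition, namely, 

$E_n(\lambda) := \int (M_{r,n}(\theta) - \lambda)^+ \, d\theta = \int_{\Theta_{\lambda,n}} M_{r,n}(\theta) \, d\theta - \lambda \mathcal{L}\{\Theta_{\lambda,n}\}$, 

where $\Theta_{\lambda,n} = \{\theta \in \mathbb{R}^2 : M_{r,n}(\theta) \ge \lambda\}$ is the empirical $\lambda$-level set. Using the 
notation 

$H_{\lambda,n}\{\Theta\} := \int_\Theta M_{r,n}(\theta) \, d\theta - \lambda \mathcal{L}\{\Theta\}$, 

we have that $E_n(\lambda) = \sup\{H_{\lambda,n}\{\Theta\} : \Theta \subset \mathbb{R}^2, \text{ compact}\}$. Note that the 
empirical $\lambda$-level set $\Theta_{\lambda,n}$ is a closed subset of $\mathbb{R}^2$; this follows from the fact that 
$M_{r,n}: \mathbb{R}^2 \to [0, 1]$ is upper semi-continuous [see, e.g., Rudin (1987), pages 
37 and 38]. Since the parameter $\theta$ is assumed to take values in the compact 
set $\Theta_0$, $\Theta_{\lambda,n}$ is also bounded. 

Following Polonik (1995), we also consider the excess mass functional over 
some classes of subsets in $\mathbb{R}^2$. Let $\mathcal{T}$ be a class of compact subsets of $\mathbb{R}^2$. 
The excess mass functional over $\mathcal{T}$ at level $\lambda \in (0, 1)$ is given by 

$E_{\mathcal{T}}(\lambda) := \sup\{H_\lambda\{\Theta\} : \Theta \in \mathcal{T}\} = \sup_{\Theta \in \mathcal{T}} [\int_\Theta M_r(\theta) \, d\theta - \lambda \mathcal{L}\{\Theta\}]$. 

Every set $\Theta_\lambda(\mathcal{T}) \in \mathcal{T}$ satisfying $E_{\mathcal{T}}(\lambda) = H_\lambda\{\Theta_\lambda(\mathcal{T})\}$ is called the $\lambda$-level set 
in $\mathcal{T}$. Clearly, $E_{\mathcal{T}}(\lambda) \le E(\lambda)$ and $E_{\mathcal{T}}(\lambda) = E(\lambda)$ if $\Theta_\lambda \in \mathcal{T}$. The empirical 
version $E_{\mathcal{T},n}(\lambda)$ of $E_{\mathcal{T}}(\lambda)$ is defined by 

$E_{\mathcal{T},n}(\lambda) := \sup\{H_{\lambda,n}\{\Theta\} : \Theta \in \mathcal{T}\} = \int_{\Theta_{\lambda,n}(\mathcal{T})} M_{r,n}(\theta) \, d\theta - \lambda \mathcal{L}\{\Theta_{\lambda,n}(\mathcal{T})\}$, 

where $\Theta_{\lambda,n}(\mathcal{T})$ is the empirical $\lambda$-level set in $\mathcal{T}$. 

We stress that the excess mass approach is very natural in the context of 
the HT. In particular, the value of $E_n(\lambda)$ conveniently quantifies the total 
sum of counts corresponding to cells with counts exceeding $\lambda$. Consequently, 
asymptotic behavior of the empirical excess mass functional is of interest. 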
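
On a quantized Hough domain the empirical excess mass reduces to a finite sum over cells. The sketch below assumes that the objective has already been evaluated on a regular grid whose cells all have the same area (the names `grid_values` and `cell_area` are illustrative), and computes the functional in both of its forms: as the integral of the positive part, and as the integral over the level set minus the level times the area of that set.

```python
def excess_mass_positive_part(grid_values, cell_area, lam):
    """Riemann-sum version of the integral of (M - lam)^+ over the grid."""
    return cell_area * sum(
        max(m - lam, 0.0) for row in grid_values for m in row
    )


def excess_mass_level_set(grid_values, cell_area, lam):
    """Same quantity via the level set {M >= lam}: the integral of M over the
    set minus lam times the area of the set."""
    level = [m for row in grid_values for m in row if m >= lam]
    return cell_area * sum(level) - lam * cell_area * len(level)


# Normalized counts on a 2 x 2 grid of cells, each of area 0.5.
grid_values = [[0.0, 0.5], [1.0, 0.25]]
e_pos = excess_mass_positive_part(grid_values, cell_area=0.5, lam=0.25)
e_lvl = excess_mass_level_set(grid_values, cell_area=0.5, lam=0.25)
```

The two forms agree because cells on which the objective equals the level contribute zero to either expression.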

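5.2. Asymptotics of the empirical excess mass functional. The asymptotic 
behavior of the empirical excess mass functional is the key building 
block in a statistical procedure for detecting multiple lines; this is given in 
the next theorem. To that end, let us denote 

$\nu_n(\lambda) := \sqrt{n} \int_{\Theta_\lambda} [M_{r,n}(\theta) - M_r(\theta)] \, d\theta$, $\lambda \in \Lambda := [\underline{\lambda}, \bar{\lambda}] \subset (0, 1)$, 

and let $\ell^\infty(\Lambda)$ denote the space of all uniformly bounded real-valued 
functions over $\Lambda$. 

Theorem 4. Suppose that $M_r: \mathbb{R}^2 \to [0, 1]$ satisfies 

(9) $\lim_{\delta \to 0} \sup_{\lambda \in \Lambda} \mathcal{L}\{\{\theta : |M_r(\theta) - \lambda| < \delta\}\} = 0$. 

Then: 

(i) $\sup_{\lambda \in \Lambda} |\sqrt{n}[E_n(\lambda) - E(\lambda)] - \nu_n(\lambda)| = o_p(1)$ as $n \to \infty$, and 

(10) $\nu_n(\lambda) \Rightarrow \int_{\Theta_\lambda} G(\theta) \, d\theta$ in $\ell^\infty(\Lambda)$ as $n \to \infty$, 

where $G(\cdot)$ is a zero mean Gaussian random field with covariance kernel 

(11) $E[G(\xi)G(\eta)] = P\{|Z^T\xi - Y| \le r\|Z\|, |Z^T\eta - Y| \le r\|Z\|\} - P\{|Z^T\xi - Y| \le r\|Z\|\} P\{|Z^T\eta - Y| \le r\|Z\|\}$, 

where $Z = (X, 1)^T$ and $\xi, \eta \in \mathbb{R}^2$. 

(ii) Let $\mathcal{T}$ denote the class of compact subsets of $\mathbb{R}^2$ such that $\Theta_\lambda \in \mathcal{T}$ 
for every $\lambda \in \Lambda$. Then 

$\sup_{\lambda \in \Lambda} |\sqrt{n}[E_{\mathcal{T},n}(\lambda) - E(\lambda)] - \nu_n(\lambda)| = o_p(1)$, $n \to \infty$, 

and (10) holds. 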

The asymptotics of the empirical excess mass functional are determined 
by two factors: the asymptotic behavior of the random field $M_{r,n}(\theta)$ and the 
asymptotic behavior of the (random) level set $\Theta_{\lambda,n}$. There are essentially 
two main ideas that underlie the proof: (i) the class of sets generated by 
the Hough template, $\mathcal{D} = \{D_\theta : \theta \in \mathbb{R}^2\}$, is a separable VC class of sets, and, 
thus, a uniform central limit theorem holds for the random field $M_{r,n}(\cdot)$ 
[cf. Proposition 2]; (ii) under assumption (9), which essentially posits that 
the deterministic field $M_r(\cdot)$ does not have “flat parts,” the convergence of 
the random field also implies convergence of the associated (random) level 
sets to their deterministic counterparts. In the absence of assumption (9), 
difficulties can easily arise in “mode testing” [see Müller and Sawitzki (1991) 
and Polonik (1995), where a similar condition is imposed in the context of 
excess mass testing for modes of a distribution]. 

5.3. Testing for multiple lines. We now sketch how Theorem 4 may be 
used for detecting multiple lines in some specific cases. To illustrate the 
ideas, consider the following hypothesis test: 

(12) $H_0$: one line vs. $H_1$: more than one line. 

The rigorous interpretation of the above is that, under the null hypothesis, 
the data are generated by the model (5) with some unknown $\theta_0 = (a_0, b_0)$, and 
assumptions (a) and (b) of Section 3 hold. To characterize the behavior of 
excess mass functionals under the null hypothesis, we will need the next 
result which essentially states that under $H_0$ the $\lambda$-level set $\Theta_\lambda$ for $\lambda \in \Lambda$ is 
a convex set which is balanced around $\theta_0 = (a_0, b_0)$. 

Proposition 1. Assume that the data are generated by the model (5), 
and assumptions (a) and (b) hold. Then $M_r(\theta) = \bar M_r(\theta - \theta_0)$ for some 
function $\bar M_r(\cdot)$ which is symmetric about zero with unique mode at $\theta = 0$. In 
addition, the set $\bar\Theta_\lambda = \{\theta \in \mathbb{R}^2 : \bar M_r(\theta) \ge \lambda\}$ is a closed convex and balanced 
set (i.e., if $\theta \in \bar\Theta_\lambda$, then $-\theta \in \bar\Theta_\lambda$). 

First consider the testing problem under the assumption that the 
distributions of $\varepsilon$ and $X$ are known. Suppose that $M_r(\cdot)$ has no “flat parts,” 
that is, (9) holds. By Proposition 1, under $H_0$ the excess mass functional 
$E(\lambda)$ is completely specified and given by 

$E_*(\lambda) = \int (M_r(\theta) - \lambda)^+ \, d\theta 
= \int (\bar M_r(\theta) - \lambda)^+ \, d\theta 
= \int (P\{|\varepsilon + Z^T\theta| \le r\|Z\|\} - \lambda)^+ \, d\theta$. 

Thus, (12) reduces to testing 

$H_0'$: $E(\lambda) = E_*(\lambda)$ $\forall \lambda \in \Lambda$ vs. $H_1'$: $E(\lambda) \neq E_*(\lambda)$ for some $\lambda \in \Lambda$. 

It follows from Theorem 4(i) that 

$T_n := \sqrt{n} \sup_{\lambda \in \Lambda} |E_n(\lambda) - E_*(\lambda)| \Rightarrow \chi$, 

where $\chi := \sup_{\lambda \in \Lambda} |\int_{\Theta_\lambda} G(\theta) \, d\theta|$. Observe that $\Theta_\lambda = \bar\Theta_\lambda + \theta_0$, hence, 

$\chi = \sup_{\lambda \in \Lambda} |\int_{\bar\Theta_\lambda} G(\theta + \theta_0) \, d\theta| = \sup_{\lambda \in \Lambda} |\int_{\bar\Theta_\lambda} \bar G(\theta) \, d\theta|$, 

where $\bar G(\cdot) = G(\cdot + \theta_0)$. We note that the covariance kernel of the zero mean 
Gaussian process $\bar G(\cdot) := G(\cdot + \theta_0)$ does not depend on $\theta_0$ and is given by (11) 
with $Y$ replaced by $\varepsilon$. Thus, the test can be based on the statistic $T_n$ whose 
asymptotic distribution does not depend on unknown parameter $\theta_0$, and is 
completely specified under $H_0$, provided that the distributions of $\varepsilon$ and $X$ 
are known. Such a test will be consistent against all alternatives of the type 
$|E(\lambda) - E_*(\lambda)| > 0$ for some $\lambda \in \Lambda$. We note that although the assumption 
that the distributions of $\varepsilon$ and $X$ are known may seem to be restrictive, it 
is quite typical in many application settings [see, e.g., Princen, Illingworth 
and Kittler (1994)]. 

If the distributions of $X$ and $\varepsilon$ are unknown, $E_*(\cdot)$ cannot be
computed and, therefore, testing the presence of one line against multiple
lines is more complicated. In this setting one can pursue the multiple line
testing problem by comparing restricted and unrestricted empirical excess mass
functionals. Proposition 1 states that under the null hypothesis, the
$\lambda$-level set $\Theta_\lambda$ is convex and balanced around $\theta_0$.
Therefore, the test may be based on comparing $E_n(\lambda)$ with the empirical
excess mass $E_{\mathcal{C},n}(\lambda)$ over the set $\mathcal{C}$ of all
compact convex subsets of $\mathbb{R}^2$. Thus, we consider testing

$H_0$: $\Theta_\lambda \in \mathcal{C}$ $\forall \lambda \in \Lambda$ vs. $H_1$: $\Theta_\lambda \notin \mathcal{C}$ for some $\lambda \in \Lambda$.

In view of Theorem 4, a natural test statistic is
$T_n^{\mathcal{C}} := \sqrt{n} \sup_{\lambda \in \Lambda} |E_n(\lambda) - E_{\mathcal{C},n}(\lambda)|$,
and $H_0$ should be rejected for large values of $T_n^{\mathcal{C}}$. Under $H_0$,
$T_n^{\mathcal{C}} = o_p(1)$ as $n \to \infty$. On the other hand, if
$E(\lambda) - E_{\mathcal{C}}(\lambda) > 0$ for some $\lambda \in \Lambda$, then
by Theorem 4 the power of the test based on $T_n^{\mathcal{C}}$ converges to 1
as $n \to \infty$. Thus, the described test is consistent against all
alternatives of the type $E(\lambda) - E_{\mathcal{C}}(\lambda) > 0$ for some
$\lambda \in \Lambda$. Unfortunately, the limiting distribution of
$T_n^{\mathcal{C}}$ is not available; in general, it depends on the rate at which
$\sup_{\lambda \in \Lambda} \mathcal{L}\{\{\theta : |M_r(\theta) - \lambda| \le \delta\}\}$
goes to zero as $\delta \to 0$ [cf. (9)]. We note that even though the condition
$E(\lambda) - E_{\mathcal{C}}(\lambda) > 0$ does not imply that
$\Theta_\lambda \ne \Theta_\lambda(\mathcal{C})$, in many situations this is the case.
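
The quantities entering these tests can be approximated numerically. The following is a minimal sketch, assuming the parameter set is discretized on a rectangular grid and the integral defining the excess mass is replaced by a Riemann sum; the function names, the grid and the reference surface are illustrative assumptions and are not part of the original development.

```python
import numpy as np

def ht_surface(X, Y, r, a_grid, b_grid):
    """Empirical HT objective M_{r,n}(a, b) on a grid: the fraction of data
    points with |a*X_i + b - Y_i| <= r*sqrt(X_i^2 + 1)."""
    A = a_grid[:, None, None]
    B = b_grid[None, :, None]
    resid = np.abs(A * X[None, None, :] + B - Y[None, None, :])
    return np.mean(resid <= r * np.sqrt(X**2 + 1.0), axis=2)

def excess_mass(surface, lam, cell_area):
    """Riemann-sum approximation of the integral of (M(theta) - lam)_+."""
    return np.sum(np.clip(surface - lam, 0.0, None)) * cell_area

def sup_statistic(surface, ref_surface, lambdas, cell_area, n):
    """sqrt(n) * sup over the supplied levels of the absolute difference
    between two excess mass curves. The reference surface is assumed to be
    supplied by the user (for instance, computed under known distributions)."""
    diffs = [abs(excess_mass(surface, lam, cell_area)
                 - excess_mass(ref_surface, lam, cell_area))
             for lam in lambdas]
    return np.sqrt(n) * max(diffs)
```

The supremum is taken over a finite list of levels, so the result is only a discretized stand-in for the statistic in the text; critical values would still have to be obtained separately.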


6. Concluding remarks. 


1. The HT estimator can be used in the multiple regression context.
Assume the model

$Y = \sum_{k=1}^{p} \beta_k X_k + \varepsilon,$

and denote $\theta = (\beta_1, \ldots, \beta_p)^T$ and $Z = (X_1, \ldots, X_p)^T$.
Then the HT estimator is defined by

(13) $\hat\theta_{r,n} = \arg\max_{\theta \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^{n} 1\{|Z_i^T\theta - Y_i| \le r\|Z_i\|\}.$

It can be easily seen that Theorems 1-3 hold for the multiple regression
setup with obvious modifications. In particular, the breakdown point given
in Theorem 3 does not depend on the dimension. Unfortunately, the
maximization problem in (13) is difficult and cannot be solved as easily as
in the two-dimensional case.

2. The slow, cube-root convergence rate of the HT estimator is a
consequence of the discontinuous objective function. Kim and Pollard (1990)
study this phenomenon and survey various estimation settings in which cube
root convergence rates govern the asymptotics. To obtain a faster rate, the
original objective function might be approximated by a smooth function, and
the resulting modified “smoothed” estimator would have standard $\sqrt{n}$
asymptotics and “good” breakdown properties. In this case maximization of the
objective function can be pursued using a gradient-based search.

3. A variety of modified estimators may be obtained using different
cell shapes in the Hough domain. For example, a vertical line segment of
length $2r$ as a cell shape in the Hough domain corresponds to an estimator
which maximizes

$\frac{1}{n} \sum_{i=1}^{n} 1\{|Z_i^T\theta - Y_i| \le r\}$

over $\theta \in \mathbb{R}^2$. The template of this estimator represents a strip
of width $2r$ measured in the vertical direction. Such an estimator can be
viewed as a counterpart to the LMS estimator. The properties of the estimator
are quite similar to those of the HT estimator. In addition, such an estimator
is scale and affine equivariant.

4. Fitting a straight line when both variables are subject to random 
errors can be treated using the described techniques. For example, it can 
be easily shown that the estimator based on the vertical line-segment cell is 
consistent, provided the errors have symmetric strongly unimodal densities. 
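
The objective functions discussed in these remarks can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the function names are hypothetical, the exhaustive grid search is only one simple way to maximize a discontinuous objective, and the logistic surrogate in `smoothed_objective` (with bandwidth `h`) is an illustrative smoothing choice, not one prescribed in the text.

```python
import numpy as np

def ht_objective(theta, X, Y, r):
    """HT objective: fraction of Hough-domain lines {(a, b): Y_i = a*X_i + b}
    passing within distance r of the point theta = (a, b)."""
    a, b = theta
    return np.mean(np.abs(a * X + b - Y) <= r * np.sqrt(X**2 + 1.0))

def strip_objective(theta, X, Y, r):
    """Vertical-segment cell: fraction of points within vertical distance r
    of the line y = a*x + b."""
    a, b = theta
    return np.mean(np.abs(a * X + b - Y) <= r)

def smoothed_objective(theta, X, Y, r, h):
    """Smooth surrogate of the HT objective: the indicator 1{u <= 1} is
    replaced by a logistic function of (1 - u)/h (an assumed choice)."""
    a, b = theta
    u = np.abs(a * X + b - Y) / (r * np.sqrt(X**2 + 1.0))
    # 0.5*(1 + tanh(t/2)) equals the logistic function of t, without overflow
    return np.mean(0.5 * (1.0 + np.tanh((1.0 - u) / (2.0 * h))))

def grid_argmax(objective, a_grid, b_grid, X, Y, r):
    """Exhaustive search over a grid; returns the first maximizer found."""
    best, best_val = None, -np.inf
    for a in a_grid:
        for b in b_grid:
            val = objective((a, b), X, Y, r)
            if val > best_val:
                best, best_val = (a, b), val
    return best, best_val
```

Because the indicator-based objectives are piecewise constant, the maximizer is generally not unique on a grid; the smoothed version is differentiable in `theta` and could instead be handed to a gradient-based optimizer.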

APPENDIX A: PROOFS FOR SECTION 3 

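Proof of Theorem 1. Conditioning on $X$, we have for $\theta \ne \theta_0$,

$\mathbb{E}[M_{r,n}(\theta) \mid X] = \mathbb{P}\{|Xa + b - Y|^2 \le r^2(X^2 + 1) \mid X\}$

$= \mathbb{P}\{-r\sqrt{X^2 + 1} - X(a - a_0) - (b - b_0) \le -\varepsilon \le r\sqrt{X^2 + 1} - X(a - a_0) - (b - b_0) \mid X\}$

$\le \mathbb{P}\{-r\sqrt{X^2 + 1} \le -\varepsilon \le r\sqrt{X^2 + 1} \mid X\},$

with strict inequality whenever $X(a - a_0) + (b - b_0) \ne 0$. The last
inequality is a consequence of the Anderson lemma [Anderson (1955)] and the
fact that $f$ is symmetric and strictly unimodal. Hence, $\theta_0$ is a
unique point of maximum of the function $M_r(\theta) := \mathbb{E} M_{r,n}(\theta)$
for any $r > 0$. In particular, denoting by $B_\varepsilon(\theta_0)$ the ball
of radius $\varepsilon$ with center $\theta_0$, we have that for any
$\varepsilon > 0$,

(14) $\max_{\theta \in B_\varepsilon^c(\theta_0)} M_r(\theta) < M_r(\theta_0).$

The point of maximum of $M_r(\cdot)$ is, thus, unique and well separated.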

Consider the class of sets $\mathcal{D} = \{D_\theta, \theta \in \mathbb{R}^2\}$,
where $D_\theta$ is defined in (4). This class has polynomial discrimination,
that is, it is a Vapnik-Červonenkis (VC) class of sets [see Pollard (1984),
Definition II.13, or van der Vaart and Wellner (1996), page 85]. Indeed, as
was mentioned before, $\mathcal{D}$ is a class of subsets of the plane
generated by a linear space of quadratic forms. Hence, by Lemma II.18 in
Pollard (1984), $\mathcal{D}$ has polynomial discrimination. Note also that
$\mathcal{D}$ is universally separable in the sense of Pollard [(1984), page 38].
[This follows straightforwardly from Pollard (1984), page 38, problem 4.]
Therefore, we conclude that the random variable
$\sup_\theta |M_{r,n}(\theta) - M_r(\theta)|$ is measurable. Now, Theorem II.14
from Pollard (1984) implies that

(15) $\sup_\theta |M_{r,n}(\theta) - M_r(\theta)| = \sup_{D \in \mathcal{D}} |\mathbb{P}_n(D) - \mathbb{P}(D)| \to 0$ a.s.

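Further, since $\hat\theta_{r,n}$ maximizes $M_{r,n}(\cdot)$ and $\theta_0$
maximizes $M_r(\cdot)$, write

$0 \le M_r(\theta_0) - M_r(\hat\theta_{r,n})$

$= M_r(\theta_0) - M_{r,n}(\hat\theta_{r,n}) + M_{r,n}(\hat\theta_{r,n}) - M_r(\hat\theta_{r,n})$

$\le \sup_\theta |M_r(\theta) - M_{r,n}(\theta)| + M_{r,n}(\hat\theta_{r,n}) - M_r(\hat\theta_{r,n})$

$\le 2 \sup_\theta |M_r(\theta) - M_{r,n}(\theta)|.$

Hence, (15) implies

(16) $|M_r(\hat\theta_{r,n}) - M_r(\theta_0)| \to 0,$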

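almost surely, as $n \to \infty$. Fix $\varepsilon > 0$. Then by (14) there
exists a $\delta > 0$ such that
$\max_{\theta \in B_\varepsilon^c(\theta_0)} M_r(\theta) < M_r(\theta_0) - \delta$.
Consequently, we have the set inclusion

$\{\hat\theta_{r,n} \in B_\varepsilon^c(\theta_0) \text{ i.o.}\} \subseteq \{M_r(\hat\theta_{r,n}) < M_r(\theta_0) - \delta \text{ i.o.}\}.$

But (16) implies that the probability of the event on the right-hand side
is zero. Thus, we conclude that $\{\hat\theta_{r,n} \in B_\varepsilon(\theta_0) \text{ ev.}\}$
occurs with probability one. Since $\varepsilon > 0$ was arbitrary, we have
that $\hat\theta_{r,n} \to \theta_0$, almost surely, as $n \to \infty$. This
concludes the proof. □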

Proof of Theorem 2. The proof is based on verifying conditions of
the main theorem of Kim and Pollard (1990) [cf. also Theorem 3.2.10 in
van der Vaart and Wellner (1996)].

Let $V(\theta)$ denote the second derivative matrix of the function

$M_r(\theta) = \mathbb{P}\{|Xa + b - Y| \le r\sqrt{X^2 + 1}\} = \mathbb{P}\{|Z^T\theta - Y| \le r\|Z\|\}.$

Write

(17) $M_r(\theta) = \mathbb{E}[F(r\|Z\| - Z^T(\theta - \theta_0)) - F(-r\|Z\| - Z^T(\theta - \theta_0))],$

where $F$ is the distribution function of $\varepsilon$, and the expected value
above is taken w.r.t. the distribution of $Z := (X, 1)^T$. Now, recall that $f$
is assumed to be continuously differentiable with bounded derivative, and that
$\mathbb{E}X^2 < \infty$. Therefore, we can apply the dominated convergence
theorem to interchange the order of expectation and differentiation for the
expression on the right-hand side of (17). In particular, (17) can be
differentiated twice w.r.t. $\theta$ under the integral sign, yielding

$V(\theta) := \nabla_\theta^2 M_r(\theta)$

$= \mathbb{E}\{[f'(r\|Z\| - (\theta - \theta_0)^T Z) - f'(-r\|Z\| - (\theta - \theta_0)^T Z)] Z Z^T\}.$

Let $V_0 = V(\theta_0)$. Note that the matrix $V_0$ is negative definite when
$X$ is nondegenerate. This follows because for strictly unimodal symmetric
densities $f$, $f'(x) - f'(-x) < 0$ for all $x > 0$, and under the premise of
the theorem, $\mathbb{E}ZZ^T$ is positive definite.





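For $\delta > 0$ consider classes of functions
$\mathcal{M}_\delta = \{m_\theta - m_{\theta_0} : \|\theta - \theta_0\| \le \delta\}$,
where $m_\theta = 1_{D_\theta}$, and $D_\theta$ is defined in (4). These classes
have polynomial discrimination, that is, they are VC classes [see Pollard
(1984), Definition II.13, or van der Vaart and Wellner (1996), page 85] with
envelope functions

$M_\delta = \sup_{\|\theta - \theta_0\| \le \delta} \left| 1\left\{-r \le \frac{Z^T\theta - Y}{\|Z\|} \le r\right\} - 1\left\{-r \le \frac{Z^T\theta_0 - Y}{\|Z\|} \le r\right\} \right|$

$\le 1\left\{-r - \delta \le \frac{Z^T\theta_0 - Y}{\|Z\|} \le -r + \delta\right\} + 1\left\{r - \delta \le \frac{Z^T\theta_0 - Y}{\|Z\|} \le r + \delta\right\}.$

Therefore, for small $\delta$,

$\mathbb{E}M_\delta^2 \le \mathbb{P}\left\{-r - \delta \le \frac{Z^T\theta_0 - Y}{\|Z\|} \le -r + \delta\right\} + \mathbb{P}\left\{r - \delta \le \frac{Z^T\theta_0 - Y}{\|Z\|} \le r + \delta\right\} \le c\delta =: c\phi^2(\delta)$
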

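for some positive constant $c$. This verifies condition (vi) in Kim and Pollard
[(1990), Theorem 1.1], namely, that $\mathbb{E}M_\delta^2 = O(\delta)$. Thus, we
anticipate that $n^{-1/3}$ is the rate at which $\hat\theta_{r,n}$ converges to
$\theta_0$. To arrive at a rigorous conclusion, the key is to compute
$\mathbb{E}(m_{\theta_0 + \delta\xi} - m_{\theta_0 + \delta\eta})^2$ for fixed
$\delta > 0$ and $\xi, \eta \in \mathbb{R}^2$. This behavior, together with the
order of $\phi(\delta)$, will also determine the structure of the increments of
the limiting Gaussian process asserted in the theorem. To that end, note that

$\mathbb{E}[(m_{\theta_0 + \delta\xi} - m_{\theta_0 + \delta\eta})^2; Z^T\xi < Z^T\eta]$

$= \mathbb{E}\int f(x) 1\{x \in [-r\|Z\| + \delta Z^T\xi, -r\|Z\| + \delta Z^T\eta]\} \, dx \, 1\{Z^T\xi < Z^T\eta\}$

$+ \mathbb{E}\int f(x) 1\{x \in [r\|Z\| + \delta Z^T\xi, r\|Z\| + \delta Z^T\eta]\} \, dx \, 1\{Z^T\xi < Z^T\eta\}$

$= \mathbb{E}[F(-r\|Z\| + \delta Z^T\eta) - F(-r\|Z\| + \delta Z^T\xi); Z^T\xi < Z^T\eta]$

$+ \mathbb{E}[F(r\|Z\| + \delta Z^T\eta) - F(r\|Z\| + \delta Z^T\xi); Z^T\xi < Z^T\eta]$

$=: I_1 + I_2.$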


Similar expressions hold when the above expectation is taken on the event
$\{Z^T\xi > Z^T\eta\}$, with $\xi$ replaced by $\eta$ and vice versa. Our
objective is to evaluate an expression for

$\lim_{\delta \to 0} \frac{\mathbb{E}(m_{\theta_0 + \delta\xi} - m_{\theta_0 + \delta\eta})^2}{\phi^2(\delta)}.$

But, since $\phi^2(\delta) = \delta$, this amounts to differentiating
$\mathbb{E}(m_{\theta_0 + \delta\xi} - m_{\theta_0 + \delta\eta})^2$ w.r.t.
$\delta$ under the integral. (This interchange is justified since $f$, the
density of $\varepsilon$, is assumed to be bounded, and $Z$ has finite second
moment.) Given the above expressions for $I_1$ and $I_2$, straightforward
algebra yields

$\lim_{\delta \to 0} \frac{\mathbb{E}(m_{\theta_0 + \delta\xi} - m_{\theta_0 + \delta\eta})^2}{\phi^2(\delta)} = \mathbb{E}\{[f(-r\|Z\|) + f(r\|Z\|)] |Z^T(\xi - \eta)|\}$

$= 2\mathbb{E}\{f(r\|Z\|) |Z^T(\xi - \eta)|\}.$

This completes the proof. □

Proof of Theorem 3. Under the premise of the theorem, there are
no parallel lines $L_i$ in the Hough domain. In other words, any pair of
random lines intersect, and there is a closed ball of finite radius that
contains the set of all intersection points. By construction, for fixed $n$,
$\theta = \hat\theta_{r,n}$ is the center of the ball of radius $r$ that crosses
over the maximal number of random lines $L_i$ in the parameter space. Of
course, $nM_{r,n}(\hat\theta_{r,n})$ is the corresponding number of such lines.
Clearly, in order to shift this estimate to infinity one should add at least
$nM_{r,n}(\hat\theta_{r,n}) - 1$ lines at infinity. Thus, the smallest
contamination fraction under which $\hat\theta_{r,n}$ breaks down is
$(nM_{r,n}(\hat\theta_{r,n}) - 1)/(n + nM_{r,n}(\hat\theta_{r,n}) - 1)$. Applying
the argument as in the proof of Theorem 1, we conclude
$M_{r,n}(\hat\theta_{r,n}) \to M_r(\theta_0) = \mathbb{P}\{\varepsilon^2 \le r^2\|Z\|^2\}$
almost surely, and the result for
$\varepsilon_{\mathrm{add}}(\hat\theta_{r,n}; \mathcal{Y}_n)$ follows. For the
replacement breakdown point, it is sufficient to note that under the premise
of the theorem at least $\lceil nM_{r,n}(\hat\theta_{r,n})/2 \rceil$ lines
should be replaced. The proof is complete. □
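
The contamination fractions appearing in this proof are simple to evaluate. The sketch below uses hypothetical function names; `m` stands for the count $nM_{r,n}(\hat\theta_{r,n})$. Dividing the replacement count by $n$ to obtain a fraction is an assumption based on the usual definition of the replacement breakdown point, and the limiting value $p/(1+p)$ is obtained by letting $n \to \infty$ in the addition fraction with $M_{r,n}(\hat\theta_{r,n}) \to p$.

```python
import math

def addition_breakdown_fraction(n, m):
    """(m - 1) / (n + m - 1), where m = n * M_{r,n}(theta_hat) is the number
    of lines crossing the maximizing ball."""
    return (m - 1) / (n + m - 1)

def replacement_breakdown_fraction(n, m):
    """ceil(m / 2) replaced lines out of n (fraction form is an assumption)."""
    return math.ceil(m / 2) / n

def limiting_addition_fraction(p):
    """Limit of the addition fraction as n grows, with p = P{eps^2 <= r^2 ||Z||^2}."""
    return p / (1.0 + p)
```

For example, with `n = 100` and `m = 60` the addition fraction is `59/159`, while the limiting value for `p = 0.6` is `0.375`.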

APPENDIX B: PROOFS FOR SECTION 5 

First we state the uniform central limit theorem for the random field
$M_{r,n}(\cdot)$ alluded to before. The statement is formulated in terms of the
class of sets generated by the Hough template.

Proposition 2. Let $\mathcal{D} = \{m_\theta = 1_{D_\theta} : \theta \in \mathbb{R}^2\}$,
where $D_\theta$ is defined in (4). Let $\ell^\infty(\mathcal{D})$ denote the set
of all uniformly bounded real functions on $\mathcal{D}$. Then the class
$\mathcal{D}$ is $\mathbb{P}$-Donsker, that is,
$\sqrt{n}(\mathbb{P}_n - \mathbb{P}) \Rightarrow G_{\mathbb{P}}$ in
$\ell^\infty(\mathcal{D})$, where the limit process
$\{G_{\mathbb{P}} m_\theta : m_\theta \in \mathcal{D}\}$ is zero mean Gaussian
with covariance function

(18) $\mathbb{E}[G_{\mathbb{P}} m_\xi \, G_{\mathbb{P}} m_\eta] = \mathbb{P}(D_\xi \cap D_\eta) - \mathbb{P}(D_\xi)\mathbb{P}(D_\eta).$

The proposition follows from the uniform central limit theorem for
measurable VC-classes [e.g., Corollary 6.3.17 in Dudley (1999)]. Through the
mapping $\theta \mapsto D_\theta$, the weak convergence in
$\ell^\infty(\mathcal{D})$ implies that
$\sqrt{n}(M_{r,n}(\cdot) - M_r(\cdot)) \Rightarrow G(\cdot)$, where
$\Rightarrow$ denotes weak convergence in $\ell^\infty(\mathbb{R}^2)$, and the
limit is a zero mean Gaussian process with covariance function induced
by (18).

Proof of Theorem 4. First we prove the statement given in part (i)
of the theorem. The proof proceeds in two steps.

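Step 1. We will require a notion of convergence of sets (all sets are
members of the Borel $\sigma$-field over $\mathbb{R}^2$). For any two sets
$A_1, A_2$, let $A_1 \Delta A_2 := (A_1 \setminus A_2) \cup (A_2 \setminus A_1)$
be the symmetric difference, and define

$d(A_1, A_2) := \sup_{k \ge 1} \mathcal{L}\{(A_1 \Delta A_2) \cap S_k\},$

where $\mathcal{L}\{\cdot\}$ stands for Lebesgue measure in $\mathbb{R}^2$ and
$S_k = \{\theta \in \mathbb{R}^2 : \|\theta\| \le k\}$. Note that the above
supremum is always finite due to the compactness assumption of the parameter
space. First, we prove that

(19) $\sup_{\lambda \in \Lambda} d(\Theta_\lambda, \Theta_{\lambda,n}) \to 0$ a.s.,

as $n \to \infty$ [we refer to Molchanov (1998) for closely related results].
For brevity, let us denote
$\Delta_{\lambda,n} := \Theta_\lambda \Delta \Theta_{\lambda,n}$. Fix
$\delta > 0$. We start with the decomposition

$d(\Theta_\lambda, \Theta_{\lambda,n}) = \mathcal{L}\{\Delta_{\lambda,n} \cap \{\theta : |M_r(\theta) - \lambda| \le \delta\}\}$

$+ \mathcal{L}\{\Delta_{\lambda,n} \cap \{\theta : |M_r(\theta) - \lambda| > \delta\}\}.$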

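The first term on the right-hand side is dominated by
$\mathcal{L}\{\{\theta : |M_r(\theta) - \lambda| \le \delta\}\}$. The second
term on the right-hand side can be upper bounded using the Markov inequality
as follows:

$\mathcal{L}\{\Delta_{\lambda,n} \cap \{\theta : |M_r(\theta) - \lambda| > \delta\}\} \le \delta^{-1} \int_{\Delta_{\lambda,n}} |M_r(\theta) - \lambda| \, d\theta$

$\le \delta^{-1} \mathcal{L}\{\Delta_{\lambda,n}\} \sup_{\theta \in \Delta_{\lambda,n}} |M_r(\theta) - \lambda|.$

Now, for sufficiently large $n$ (not depending on the choice of $\lambda$) we
have (a.s.) the set inclusions

$\Delta_{\lambda,n} = \{\theta : M_r(\theta) \ge \lambda, M_{r,n}(\theta) < \lambda\} \cup \{\theta : M_r(\theta) < \lambda, M_{r,n}(\theta) \ge \lambda\}$

$\subseteq \{\theta : M_r(\theta) \ge \lambda, M_r(\theta) < \lambda + \eta_n\} \cup \{\theta : M_r(\theta) < \lambda, M_r(\theta) \ge \lambda - \eta_n\}$

$\subseteq \{\theta : |M_r(\theta) - \lambda| \le \eta_n\},$

where

$\eta_n := \sup_\theta |M_{r,n}(\theta) - M_r(\theta)|$

and does not depend on $\lambda$. It follows that

$\sup_{\theta \in \Delta_{\lambda,n}} |M_r(\theta) - \lambda| \le \eta_n.$

In particular, we have for sufficiently large $n$ (independent of $\lambda$) that

$d(\Theta_\lambda, \Theta_{\lambda,n}) = \mathcal{L}\{\Delta_{\lambda,n}\}$

$\le \mathcal{L}\{\{\theta : |M_r(\theta) - \lambda| \le \eta_n\}\}$

and the bound on the right-hand side is uniform in $\lambda$. Thus, taking the
supremum over $\lambda \in \Lambda$, letting $n \to \infty$ and appealing to
condition (9), we obtain the asserted asymptotic (19).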


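Step 2. We now show that for all $\lambda \in \Lambda$,

(20) $\sqrt{n}(E_n(\lambda) - E(\lambda)) = U_n(\lambda) + o_p(1)$, $n \to \infty$,

where $o_p(1)$ is uniform in $\lambda \in \Lambda$. First, observe that

(21) $E_n(\lambda) - E(\lambda) = \int_{\Theta_{\lambda,n}} (M_{r,n}(\theta) - \lambda) \, d\theta - \int_{\Theta_\lambda} (M_r(\theta) - \lambda) \, d\theta$

$= \int_{\Theta_\lambda} (M_{r,n}(\theta) - M_r(\theta)) \, d\theta + R_n,$

where

(22) $R_n := \int_{\Theta_{\lambda,n} \setminus \Theta_\lambda} (M_{r,n}(\theta) - \lambda) \, d\theta - \int_{\Theta_\lambda \setminus \Theta_{\lambda,n}} (M_{r,n}(\theta) - \lambda) \, d\theta.$

Now,

$|\sqrt{n} R_n| \le \sqrt{n} \int_{\Theta_\lambda \Delta \Theta_{\lambda,n}} |M_{r,n}(\theta) - \lambda| \, d\theta$

$\le d(\Theta_\lambda, \Theta_{\lambda,n}) \sqrt{n} \sup_{\theta \in \Theta_\lambda \Delta \Theta_{\lambda,n}} |M_{r,n}(\theta) - \lambda|.$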


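To prove that $|\sqrt{n} R_n| = o_p(1)$, it suffices to prove this for the
right-hand side above. To see this, recall from Step 1 that

$\sup_{\theta \in \Theta_\lambda \Delta \Theta_{\lambda,n}} |M_r(\theta) - \lambda| \le \sup_\theta |M_{r,n}(\theta) - M_r(\theta)|,$

where the upper bound does not depend on $\lambda$. The same bound holds for
$|M_{r,n}(\theta) - \lambda|$, because on
$\Theta_\lambda \Delta \Theta_{\lambda,n}$ the values $M_r(\theta)$ and
$M_{r,n}(\theta)$ lie on opposite sides of $\lambda$. Consequently, we have that

$|\sqrt{n} R_n| \le d(\Theta_\lambda, \Theta_{\lambda,n}) \sqrt{n} \sup_\theta |M_{r,n}(\theta) - M_r(\theta)|.$

But it follows from Proposition 2 that

$\sqrt{n} \sup_\theta |M_{r,n}(\theta) - M_r(\theta)| \Rightarrow \sup_\theta |G(\theta)|,$

where $G(\cdot)$ is the zero mean Gaussian process identified in Proposition 2
and the discussion following thereafter, and the above supremum is finite,
almost surely. Note that the weak limit does not depend on $\lambda$. By Step 1
we have that $\sup_{\lambda \in \Lambda} d(\Theta_\lambda, \Theta_{\lambda,n}) \to 0$
as $n \to \infty$, a.s. Finally, using Slutsky's lemma, we have that
$\sqrt{n} R_n = o_p(1)$ uniformly in $\lambda$. This result, together
with (21), gives the assertion (20).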

Finally, we put the pieces together using the continuous mapping theorem
in the space of continuous functions [see, e.g., Billingsley (1968)], which
yields that $U_n(\cdot)$ converges to the corresponding integral of the process
$G(\cdot)$. To that end, we note that the mapping
$\lambda \mapsto \Theta_\lambda$ is continuous w.r.t. the metric $d$, because
$M_r(\cdot)$ is continuous and (9) holds. This concludes the proof of the first
statement of the theorem.

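The proof of statement (ii) goes along the same lines as above. We indicate
only the differences. Note that $E_{\mathcal{T}}(\lambda) = E(\lambda)$ because
$\Theta_\lambda \in \mathcal{T}$. Also, by definition of
$\Theta_{\lambda,n}(\mathcal{T})$,

(23) $H_{\lambda,n}\{\Theta_\lambda\} \le H_{\lambda,n}\{\Theta_{\lambda,n}(\mathcal{T})\} \le H_{\lambda,n}\{\Theta_{\lambda,n}\}.$

Therefore, similarly to (21), we write

$E_{\mathcal{T},n}(\lambda) - E(\lambda) = \int_{\Theta_\lambda} (M_{r,n}(\theta) - M_r(\theta)) \, d\theta + \tilde R_n,$

where

$\tilde R_n := \int_{\Theta_{\lambda,n}(\mathcal{T}) \setminus \Theta_\lambda} (M_{r,n}(\theta) - \lambda) \, d\theta - \int_{\Theta_\lambda \setminus \Theta_{\lambda,n}(\mathcal{T})} (M_{r,n}(\theta) - \lambda) \, d\theta$

$= H_{\lambda,n}\{\Theta_{\lambda,n}(\mathcal{T})\} - H_{\lambda,n}\{\Theta_\lambda\}$

$\le H_{\lambda,n}\{\Theta_{\lambda,n}\} - H_{\lambda,n}\{\Theta_\lambda\} = R_n;$

the last inequality follows from (23), and $R_n$ is defined in (22). Thus,
$|\sqrt{n} \tilde R_n|$ is bounded using the bounds on $|\sqrt{n} R_n|$ above.
Other details of the proof remain unchanged. □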

Proof of Proposition 1. It follows immediately from the definition
that $M_r(\theta) = \tilde M_r(\theta - \theta_0)$, where

$\tilde M_r(\theta) = \mathbb{P}\{|\varepsilon + Z^T\theta| \le r\|Z\|\}$

$= \mathbb{E}[F(r\|Z\| - Z^T\theta) - F(-r\|Z\| - Z^T\theta)].$

By symmetry of $f$,

$F(r\|Z\| - Z^T\theta) - F(-r\|Z\| - Z^T\theta)$

$= F(r\|Z\| + Z^T\theta) - F(-r\|Z\| + Z^T\theta)$ $\forall Z$,

and, therefore, $\tilde M_r(\theta) = \tilde M_r(-\theta)$ $\forall \theta$.
Uniqueness of the mode follows from the Anderson lemma.

Let $\theta_1, \theta_2 \in \tilde\Theta_\lambda$, that is,
$\tilde M_r(\theta_1) \ge \lambda$ and $\tilde M_r(\theta_2) \ge \lambda$. Let
$\theta_* = \alpha\theta_1 + (1 - \alpha)\theta_2$ for some $\alpha \in (0, 1)$,
and denote $I_1 = [-r\|Z\| - Z^T\theta_1, r\|Z\| - Z^T\theta_1]$,
$I_2 = [-r\|Z\| - Z^T\theta_2, r\|Z\| - Z^T\theta_2]$, and
$I_* = [-r\|Z\| - Z^T\theta_*, r\|Z\| - Z^T\theta_*]$.

With this notation,

$\tilde M_r(\theta_*) = \mathbb{E}\int_{I_*} f(x) \, dx.$

The lengths of $I_1$, $I_2$ and $I_*$ are equal to $2r\|Z\|$. However, since
$\min\{Z^T\theta_1, Z^T\theta_2\} \le Z^T\theta_* \le \max\{Z^T\theta_1, Z^T\theta_2\}$,
the center of $I_*$ is closer to the origin than one of the centers of $I_1$
and $I_2$. Therefore, by symmetry and unimodality of $f$, for all $Z$,

$\tilde M_r(\theta_*) = \mathbb{E}\int_{I_*} f(x) \, dx \ge \mathbb{E}\min\left\{\int_{I_1} f(x) \, dx, \int_{I_2} f(x) \, dx\right\} \ge \lambda.$

Thus, $\theta_* \in \tilde\Theta_\lambda$, and $\tilde\Theta_\lambda$ is convex. □